There has been great recent advancement in human-computer chat. However, proper evaluation currently requires human judgements that produce notoriously high-variance metrics due to their inherent subjectivity. Furthermore, there is little standardization in the methods and labels used for evaluation, with an overall lack of work to compare and assess the validity of various evaluation approaches. As a consequence, existing evaluation results likely leave an incomplete picture of the strengths and weaknesses of open-domain chatbots. We aim towards a dimensional evaluation of human-computer chat that can reliably measure several distinct aspects of chat quality. To this end, we present our novel human evaluation method that quantifies the rate of several quality-related chatbot behaviors. Our results demonstrate our method to be more suitable for dimensional chat evaluation than alternative Likert-style or comparative methods. We then use our validated method and existing methods to evaluate four open-domain chat models from the recent literature.
translated by Google Translate
Metaverse over wireless networks is an emerging use case of the sixth generation (6G) wireless systems, posing unprecedented challenges in terms of its multi-modal data transmissions with stringent latency and reliability requirements. Towards enabling this wireless metaverse, in this article we propose a novel semantic communication (SC) framework by decomposing the metaverse into human/machine agent-specific semantic multiverses (SMs). An SM stored at each agent comprises a semantic encoder and a generator, leveraging recent advances in generative artificial intelligence (AI). To improve communication efficiency, the encoder learns the semantic representations (SRs) of multi-modal data, while the generator learns how to manipulate them for locally rendering scenes and interactions in the metaverse. Since these learned SMs are biased towards local environments, their success hinges on synchronizing heterogeneous SMs in the background while communicating SRs in the foreground, turning the wireless metaverse problem into the problem of semantic multiverse communication (SMC). Based on this SMC architecture, we propose several promising algorithmic and analytic tools for modeling and designing SMC, ranging from distributed learning and multi-agent reinforcement learning (MARL) to signaling games and symbolic AI.
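Classical medium access control (MAC) protocols are interpretable, yet their task-agnostic control signaling messages (CMs) are ill-suited for emerging mission-critical applications. By contrast, neural network (NN) based protocol models (NPMs) learn to generate task-specific CMs, but their rationale and impact lack interpretability. To fill this gap, in this article we propose, for the first time, a semantic protocol model (SPM) constructed by transforming an NPM into an interpretable symbolic graph written in the probabilistic logic programming language ProbLog. This transformation is made viable by extracting and merging common CMs and their connections while treating the NPM as a CM generator. Through extensive simulations, we corroborate that the SPM tightly approximates its original NPM while occupying only 0.02% of the memory. By leveraging its interpretability and memory efficiency, we demonstrate several SPM-enabled applications, such as SPM reconfiguration for collision avoidance, comparing different SPMs via semantic entropy calculation, and storing multiple SPMs to cope with non-stationary environments.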
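In recent years, there have been great advances in the field of decentralized learning with private data. Federated learning (FL) and split learning (SL) are two spearheads, each with its own pros and cons, suited to many user clients and to large models, respectively. To enjoy both benefits, hybrid approaches such as SplitFed have emerged of late, yet their fundamentals remain elusive. In this work, we first identify the fundamental bottlenecks of SL, and thereby propose a scalable SL framework, coined SGLR. The server under SGLR broadcasts a common gradient averaged at the split layer, emulating FL without any additional communication across clients, as opposed to SplitFed. Meanwhile, SGLR splits the learning rate into server-side and client-side rates and adjusts them separately to support many clients in parallel. Simulation results corroborate that SGLR achieves higher accuracy than other baseline SL methods, including SplitFed, and is even on par with FL, which consumes more energy and incurs higher communication costs. As a secondary result, we observe a greater reduction in the leakage of sensitive information, measured via mutual information, when using SGLR rather than the baselines.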
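The two SGLR ingredients named in this abstract, a common gradient averaged at the split layer and separate server-side and client-side learning rates, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the scalar client layer, and the numbers are hypothetical and are not taken from the paper.

```python
from typing import List


def broadcast_gradient(split_grads: List[List[float]]) -> List[float]:
    """Average the per-client gradients taken at the split layer, element-wise."""
    n_clients = len(split_grads)
    return [sum(column) / n_clients for column in zip(*split_grads)]


def client_update(weight: float, inputs: List[float],
                  common_grad: List[float], client_lr: float) -> float:
    """One SGD step for a toy client layer a_j = weight * inputs[j].

    Chain rule: dL/dweight = sum_j common_grad[j] * inputs[j].
    """
    grad_w = sum(g * x for g, x in zip(common_grad, inputs))
    return weight - client_lr * grad_w


def server_update(weights: List[float], server_grad: List[float],
                  server_lr: float) -> List[float]:
    """One SGD step on the server-side weights with the server-side rate."""
    return [w - server_lr * g for w, g in zip(weights, server_grad)]


# Hypothetical split-layer gradients for three clients (two activations each).
split_grads = [[0.2, -0.4], [0.4, 0.0], [0.6, 0.4]]
common_grad = broadcast_gradient(split_grads)

# Every client receives the same averaged gradient but uses the client-side rate.
new_client_weight = client_update(1.0, [1.0, 2.0], common_grad, client_lr=0.1)

# The server applies its own, separately tuned rate to its own gradient.
new_server_weights = server_update([1.0, -1.0], [0.5, 0.5], server_lr=0.5)
```

Because the server sends one averaged vector to all clients, no client-to-client exchange is needed, which is the property the abstract contrasts with SplitFed.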
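Data augmentation is an important component in the robustness evaluation of natural language processing (NLP) models, as well as in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/gem-benchmark/nl-augmenter).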
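The distinction between a transformation (a modification to the data) and a filter (a data split by a feature) can be shown with a minimal sketch. The functions below are hypothetical and do not use or reproduce the NL-Augmenter API; they only illustrate the two concepts on plain strings.

```python
from typing import Dict, List

# Hypothetical lookup table used by the toy transformation.
CONTRACTIONS: Dict[str, str] = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
}


def expand_contractions(sentence: str) -> str:
    """Transformation: rewrite the text while keeping its meaning."""
    tokens = sentence.split()
    return " ".join(CONTRACTIONS.get(tok.lower(), tok) for tok in tokens)


def is_long(sentence: str, min_tokens: int = 5) -> bool:
    """Filter: select examples by a feature, here the whitespace token count."""
    return len(sentence.split()) >= min_tokens


data: List[str] = ["I don't like this movie at all", "It's fine"]

# A transformation maps each example to a perturbed example.
transformed = [expand_contractions(s) for s in data]

# A filter keeps only the examples that have the chosen feature.
long_subset = [s for s in data if is_long(s)]
```

A robustness check would then compare a model's predictions on `data` and `transformed`, or report its accuracy separately on `long_subset`.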
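We target the task of cross-lingual machine reading comprehension (MRC) in the direct zero-shot setting by incorporating syntactic features from Universal Dependencies (UD); the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt inter-sentence syntactic relations, in addition to the basic intra-sentence relations, to further exploit the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build an Inter-Sentence Dependency Graph (ISDG) that connects dependency trees to form global syntactic relations across sentences. We then propose an ISDG encoder that encodes the global dependency graph, explicitly addressing inter-sentence relations via both one-hop and multi-hop dependency paths. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, trained only on English, improves zero-shot performance on all 14 test sets covering 8 languages, with an average improvement of up to 3.8 F1 / 5.2 EM, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows that the improvement can be attributed to attention on cross-linguistically consistent syntactic paths.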
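A rough sketch of the graph-building idea follows. It assumes each sentence is given as a list of head indices with exactly one root, and it links the roots of adjacent sentences to join the trees. That linking rule is an assumption chosen for illustration, since the abstract does not say how the trees are connected, and all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]  # (sentence index, token index)


def build_graph(sentences: List[List[int]]) -> Dict[Node, Set[Node]]:
    """Build an undirected graph from per-sentence head lists.

    heads[i] is the index of token i's head in the same sentence, -1 for the root.
    Assumed rule: roots of adjacent sentences are linked to join the trees.
    """
    adj: Dict[Node, Set[Node]] = {}
    roots: List[Node] = []
    for s, heads in enumerate(sentences):
        for i, head in enumerate(heads):
            adj.setdefault((s, i), set())
            if head == -1:
                roots.append((s, i))
            else:
                adj.setdefault((s, head), set())
                adj[(s, i)].add((s, head))
                adj[(s, head)].add((s, i))
    for left, right in zip(roots, roots[1:]):
        adj[left].add(right)
        adj[right].add(left)
    return adj


def hops(adj: Dict[Node, Set[Node]], src: Node, dst: Node) -> Optional[int]:
    """Length of the shortest dependency path between two tokens (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None


# Two toy sentences: token 1 is the root of the first, token 0 of the second.
graph = build_graph([[1, -1, 1], [-1, 0]])
one_hop = hops(graph, (0, 1), (1, 0))    # root to root, joined directly
multi_hop = hops(graph, (0, 0), (1, 1))  # leaf to leaf across sentences
```

Paths of length one correspond to direct relations, while longer paths cross sentence boundaries through the linked roots.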
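Model quantization is known as a promising method for compressing deep neural networks, especially for inference on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision model, which is often infeasible in real-world scenarios because of security and privacy concerns. A popular approach to quantization without access to the original data is to use synthetically generated samples, based on batch-normalization statistics or adversarial learning. However, the drawback of such approaches is that they rely primarily on random noise input to the generator to achieve diversity in the synthetic samples. We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries. To this end, we propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary-supporting samples. For the superposed embeddings to better reflect the original distribution, we also propose using an additional disentanglement mapping layer and extracting information from the full-precision model. The experimental results show that Qimera achieves state-of-the-art performance in various settings of data-free quantization. Code is available at https://github.com/iamkanghyunchoi/qimera.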
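One simple way to picture a superposed latent embedding is as a weighted mix of two class embeddings, which a generator could then turn into a sample lying between the two classes. The sketch below assumes a convex combination with a mixing weight `lam`. The exact superposition used by Qimera is not specified in this abstract, so this is an illustration only, with hypothetical names and values.

```python
from typing import List


def superpose(emb_a: List[float], emb_b: List[float], lam: float) -> List[float]:
    """Mix two class embeddings: lam * emb_a + (1 - lam) * emb_b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]


# Hypothetical learned embeddings for two classes.
cat_embedding = [1.0, 0.0]
dog_embedding = [0.0, 1.0]

# lam = 1.0 recovers the first class, lam = 0.0 the second, and values in
# between give latent inputs intended to land near the decision boundary.
midpoint = superpose(cat_embedding, dog_embedding, lam=0.5)
```

Feeding such mixed embeddings to the generator, rather than relying on noise alone, is what the abstract describes as producing boundary-supporting samples.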
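Improving the user experience of a dialogue system often requires intensive developer effort to read conversation logs, run statistical analyses, and intuit the relative importance of system shortcomings. This paper presents a novel approach to the automated analysis of conversation logs that learns the relationship between user-system interactions and overall dialogue quality. Unlike prior work on utterance-level quality prediction, our approach learns the impact of each interaction from the overall user rating, without utterance-level annotation, allowing the resulting model conclusions to be derived from empirical evidence and at low cost. Our model identifies interactions that correlate strongly with overall dialogue quality in a chatbot setting. Experiments show that the automated analysis from our model agrees with expert judgments, making this work the first to show that such weakly supervised learning of utterance-level quality prediction is highly achievable.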
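We demonstrate a chatbot with a novel dialogue management approach based on logical inference. Instead of framing conversation as a sequence of response generation tasks, we model conversation as a collaborative inference process in which speakers share information to synthesize new knowledge in real time. Our chatbot pipeline accomplishes this modelling in three broad stages. The first stage translates user utterances into a symbolic predicate representation. The second stage then uses this structured representation, together with a larger knowledge base, to synthesize new predicates using efficient graph matching. In the third and final stage, our bot selects a small subset of predicates and translates them into an English response. This approach lends itself to understanding the latent semantics of user inputs, flexible initiative taking, and responses that are coherent with the dialogue context.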
Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal the characteristics and importance of individual instances, which may provide useful information for diagnosing and improving deep learning. However, most existing works on data valuation require actually training a model, which often incurs a high computational cost. In this paper, we provide a training-free data valuation score, called the complexity-gap score, which is a data-centric score that quantifies the influence of individual instances on the generalization of two-layer overparameterized neural networks. The proposed score can quantify the irregularity of instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.